1. INTRODUCTION


Let G = (V, E) be a connected graph with a set of vertices V and a set of edges E in which each edge e has a weight w(e). Without loss of generality, assume that the edge weights are distinct. Hence the minimum spanning tree (MST) of G is unique. Let n = |V| and m = |E|. We present an algorithm for finding the MST of G on a Common CRCW PRAM using $2m+n^{1+2\epsilon}$ processors and $O(m+n^{1+\epsilon})$ space in $O(\log n)$ time, where $\epsilon$ is a constant such that $0 < \epsilon \le 1/2$.


The MST problem is an important problem of combinatorial optimization. Some practical applications of MSTs include the design of computer, communication, and transportation networks. Graham and Hell [11] gave an extensive history of the MST problem.

Yao [13] and Cheriton and Tarjan [4] designed sequential MST algorithms that ran in $O(m \log\log n)$ time. Fredman and Tarjan [9] gave an improved algorithm that ran in $O(m\,\beta(m,n))$ time, where $\beta(m,n) = \min\{\, i \mid \log^{(i)} n \le m/n \,\}$. If $m \ge n$, then $\beta(m,n) \le \log^* n$. Gallager et al. [10] presented a distributed MST algorithm that used at most $5n\log n + 2m$ messages and $O(n\log n)$ time. Awerbuch [1] presented an optimal distributed MST algorithm that required $O(m + n\log n)$ messages and $O(n)$ time.


There have also been several parallel MST algorithms. Chin et al. [5] presented an efficient algorithm for the CREW PRAM that ran in $O(\log^2 n)$ time using $O(n^2/\log^2 n)$ processors. Hirschberg [12] gave an algorithm for the Common CRCW PRAM that ran in $O(\log n)$ time using $n^3$ processors. Awerbuch and Shiloach [2] designed an algorithm for the Priority CRCW PRAM that ran in $O(\log n)$ time using $m+n$ processors and $O(m+n)$ space.

In this paper, we employ some of the results of Fich et al. [7] and modify the algorithm of [2] to obtain a Common CRCW PRAM MST algorithm. A straightforward modification yields an algorithm that runs in $O(\log n)$ time using $mn + n^{1+2\epsilon}$ processors. We then reduce the number of processors to $2m + n^{1+2\epsilon}$. The amount of space used by our algorithm is $O(m+n^{1+\epsilon})$. Our algorithm has the same running time as the algorithm of [12] and uses fewer processors. For mildly dense graphs, where $m = \Omega(n^{1+2\epsilon})$, our algorithm has the same performance as the algorithm of [2] and uses a weaker CRCW PRAM model. Boppana [3] and Fich et al. [8] established that the time separation between the Priority PRAM and the Common PRAM, each with p processors, is $\Theta(\log p / \log\log p)$.


In Section 2, we describe the model of computation. In Section 3, we review the MST algorithm of [2]. In Section 4, we present the results of [7] that apply to our algorithm. In Section 5, we describe the modification of the algorithm of [2] to obtain our Common CRCW PRAM algorithm.


2. MODEL OF COMPUTATION

A CRCW PRAM consists of a set of processors and a shared memory. Each step consists of three phases. In the first phase, each processor may read one shared memory cell. In the second phase, each processor may perform local computations. In the third phase, each processor may write into one shared memory cell. Any number of processors may simultaneously read from a memory cell. If more than one processor simultaneously writes into the same memory cell, then the value that is written depends on the model.

Two CRCW models are the Priority model and the Common model. In the Priority model, each processor is assigned a unique priority. If more than one processor tries to write into the same cell, then the processor with the highest priority is the one that succeeds. In the Common model, if more than one processor tries to write into the same cell, then all the processors must write the same value.
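The two write rules can be illustrated with a small sequential sketch in Python. It only models one write phase and is not part of the algorithm: the helper names `priority_write` and `common_write` are hypothetical, a write request is assumed to be a tuple (processor, cell, value), and, as in Section 3, a smaller processor index is assumed to mean a higher priority.

```python
from typing import Dict, List, Tuple

# A write request issued in the third phase of a step: (processor, cell, value).
Write = Tuple[int, int, int]


def priority_write(memory: Dict[int, int], requests: List[Write]) -> None:
    """Priority rule: the writer with the smallest index (highest priority) succeeds."""
    winner: Dict[int, Write] = {}
    for req in requests:
        proc, cell, _ = req
        if cell not in winner or proc < winner[cell][0]:
            winner[cell] = req
    for _, cell, value in winner.values():
        memory[cell] = value


def common_write(memory: Dict[int, int], requests: List[Write]) -> None:
    """Common rule: all processors writing the same cell must write the same value."""
    value_of: Dict[int, int] = {}
    for _, cell, value in requests:
        if cell in value_of and value_of[cell] != value:
            raise ValueError(f"illegal Common write: conflicting values for cell {cell}")
        value_of[cell] = value
    memory.update(value_of)


mem: Dict[int, int] = {}
priority_write(mem, [(3, 0, 30), (1, 0, 10), (2, 5, 20)])
print(mem)  # {0: 10, 5: 20}
```

Under the Common rule the same three requests would be legal only if processors 3 and 1 wrote equal values into cell 0.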

3. THE MST ALGORITHM OF AWERBUCH AND SHILOACH


The algorithm of Awerbuch and Shiloach [2] uses a Priority CRCW PRAM. The priority of each processor is determined by its index: the smaller the index, the higher the priority. The algorithm assigns processors to edges such that the smaller the weight of an edge, the higher the priority of the corresponding processor. The assignment can be made by sorting the edges by weight and then assigning processors in order. This can be done in $O(\log n)$ time using the parallel merge sort algorithm of Cole [6]. Let p(i,j) be the processor assigned to edge (i,j).
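A minimal sequential sketch of this assignment follows. It assumes that edges are stored in a dictionary keyed by (i, j) and that processors are numbered from 0; Python's built-in sort stands in for Cole's parallel merge sort, and `assign_processors` is a hypothetical name.

```python
from typing import Dict, Tuple

Edge = Tuple[int, int]


def assign_processors(weight: Dict[Edge, float]) -> Dict[Edge, int]:
    """Map each edge (i, j) to its processor index p(i, j).

    Edges are ranked by weight, so the lightest edge receives index 0,
    which is the highest priority under the smaller-index-wins rule.
    """
    ordered = sorted(weight, key=weight.__getitem__)
    return {edge: rank for rank, edge in enumerate(ordered)}


p = assign_processors({(1, 2): 5.0, (2, 3): 1.0, (1, 3): 3.0})
print(p)  # {(2, 3): 0, (1, 3): 1, (1, 2): 2}
```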

A rooted tree is a tree whose edges are directed toward the root. A star is a rooted tree with height at most 1. Assume the vertices of G are numbered from 1 to n. The number of a vertex is its id. In the algorithm, there are variables associated with each vertex i. We will use the name of a vertex to refer to a variable associated with that vertex. The processors that operate on these variables, however, correspond to edges.

Each vertex i has a parent P(i), which is either another vertex or itself. If a vertex is a root, then its parent is itself. The parent-child relation defines a directed graph called the parent's graph, PG. PG has the same vertices as G. Define GP(i) = P(P(i)), and call GP(i) the grandparent of i.

The algorithm maintains a set T of undirected edges that always forms a subforest of the MST. The algorithm adds edges to T using the property that, for any nonempty proper subset of the vertices, the edge of least weight leaving the subset must belong to the MST. T grows until it becomes the MST.

The algorithm maintains the invariant that after each iteration, for each directed tree in PG, there is a subtree in T spanning the same set of vertices. The algorithm finds edges of the MST by trying to hook stars to other trees in PG. Processors that correspond to edges leaving a star try to hook the star to a tree. Edges that correspond to successful processors are added to T. After the stars are hooked, the algorithm reduces the height of each tree with a shortcut operation, where each vertex takes its grandparent to be its new parent.

T(e) is a boolean variable attached to each edge e. T(e) is initially 0. During the algorithm, if edge e is added to T, then T(e) is set to 1. WINNER(i) contains the name of the edge corresponding to the processor that succeeds in writing P(i). After the initialization, the algorithm iterates three steps until all the vertices are in the same star. The algorithm is executed in parallel by each edge processor p(i,j).

Priority CRCW PRAM Algorithm

Initialization:
    T(e) := 0 for all e ∈ E
    P(i) := i for i = 1, ..., n

repeat
    Step 1: (Star hooking)
        If i belongs to a star and P(i) ≠ P(j) then
            P(P(i)) := P(j) and WINNER(P(i)) := (i,j)
        If WINNER(P(i)) = (i,j) then T(i,j) := 1
    Step 2: (Cycle breaking)
        If i < P(i) and i = GP(i) then P(i) := i
    Step 3: (Shortcut operation)
        P(i) := GP(i)
until every vertex i belongs to the same star

Step 1 performs the hooking operation. Processors that correspond to edges leaving a star try to hook the star to another tree. A star is hooked to a tree by assigning the root of the star a parent that is a vertex of the tree to which the star is being hooked. If more than one processor tries to hook the star, then the processor with the highest priority succeeds. WINNER(i) contains the name of the edge e corresponding to the writing processor. Because a higher priority corresponds to a smaller weight, e is the edge of least weight leaving the star, so e belongs to the MST and the algorithm sets T(e) := 1. After Step 1, every star is hooked to some tree.

Step 2 eliminates any cycles that may have been formed in the parent's graph. A cycle of length two forms when an edge's endpoints belong to two different stars and the edge is the edge of least weight leaving both stars. Because the edge weights are distinct, no longer cycles can form. To break a cycle, the algorithm changes the parent pointer of the vertex with the smaller id to point to itself.

Step 3 performs the shortcut operation. For each vertex i, the algorithm sets the grandparent of i to be the new parent of i. Note that if more than one processor updates P(i), then the processors perform a common write operation. A tree of height h becomes a tree of height $\lceil h/2 \rceil$, and $\lceil h/2 \rceil \le 2h/3$ for $h \ge 2$, so the height of each tree that is not a star decreases by a factor of at least 3/2.
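The height bound can be observed on a path, the tallest rooted tree on a given number of vertices. The sketch below assumes vertices numbered from 0 and a parent array P with P[r] = r at the root; the hypothetical helper `shortcut` applies P(i) := GP(i) to every vertex at once (all reads use the old array), and the hypothetical helper `height` measures the resulting tree.

```python
from typing import List


def shortcut(P: List[int]) -> List[int]:
    """One parallel shortcut step: every vertex adopts its grandparent as parent."""
    return [P[P[i]] for i in range(len(P))]


def height(P: List[int]) -> int:
    """Largest number of parent pointers followed from any vertex to its root."""
    best = 0
    for v in range(len(P)):
        u, d = v, 0
        while P[u] != u:
            u = P[u]
            d += 1
        best = max(best, d)
    return best


# A path 7 -> 6 -> ... -> 1 -> 0 rooted at 0, so the height is 7.
P = [0] + list(range(7))
heights = [height(P)]
while height(P) > 1:
    P = shortcut(P)
    heights.append(height(P))
print(heights)  # [7, 4, 2, 1]
```

Each round maps height h to $\lceil h/2 \rceil$, which is at most 2h/3 whenever h ≥ 2.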

A vertex determines whether it belongs to a star by using Procedure Star_Check. At the termination of Star_Check, ST(i) is true if i belongs to a star and false otherwise.

Procedure Star_Check
    ST(i) := true
    If P(i) ≠ GP(i) then ST(i) := false and ST(GP(i)) := false
    ST(i) := ST(i) ∧ ST(P(i))
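The algorithm and Star_Check can be exercised with the sequential Python simulation below. It is a sketch under assumptions that the text does not make: vertices are numbered 0 to n - 1, each undirected edge (i, j) is served by two processors, p(i, j) and p(j, i), that share its weight, the graph is connected, and each parallel step is simulated by computing all reads from the old arrays before any write. The priority write of Step 1 is modeled by scanning processors in order of increasing edge weight and letting the first claimant of each cell win. The third line of Star_Check is evaluated as ST(i) ∧ ST(P(i)), so that a vertex of depth two or more, which has already cleared its own flag, is not reported as belonging to a star. The names `star_check` and `priority_mst` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def star_check(P: List[int]) -> List[bool]:
    """Procedure Star_Check: st[i] is True exactly when vertex i lies in a star."""
    n = len(P)
    st = [True] * n                                # ST(i) := true
    for i in range(n):
        if P[i] != P[P[i]]:                        # P(i) != GP(i): i has depth >= 2
            st[i] = False
            st[P[P[i]]] = False                    # ST(GP(i)) := false
    return [st[i] and st[P[i]] for i in range(n)]  # ST(i) := ST(i) and ST(P(i))


def priority_mst(n: int, weight: Dict[Edge, float]) -> Tuple[Set[Edge], int]:
    """Simulate the Priority CRCW PRAM algorithm; return (MST edges, iterations)."""
    # Processors p(a, b) for both directions of every edge, highest priority first.
    procs = sorted((w, a, b) for (i, j), w in weight.items() for a, b in ((i, j), (j, i)))
    P = list(range(n))                             # P(i) := i
    T: Set[Edge] = set()
    iterations = 0
    while any(P[v] != P[0] for v in range(n)):     # stop once all vertices form one star
        iterations += 1
        st = star_check(P)
        # Step 1 (star hooking): the first processor to claim a cell wins.
        winner: Dict[int, Tuple[int, int, int]] = {}
        for _, i, j in procs:
            if st[i] and P[i] != P[j] and P[i] not in winner:
                winner[P[i]] = (P[j], i, j)        # P(P(i)) := P(j), WINNER(P(i)) := (i, j)
        for cell, (value, i, j) in winner.items():
            P[cell] = value
            T.add((min(i, j), max(i, j)))          # T(i, j) := 1
        # Step 2 (cycle breaking): in a 2-cycle the smaller id becomes a root.
        P = [i if i < P[i] and P[P[i]] == i else P[i] for i in range(n)]
        # Step 3 (shortcut): every vertex adopts its grandparent.
        P = [P[P[i]] for i in range(n)]
    return T, iterations


w = {(0, 1): 1, (1, 2): 2, (2, 3): 3, (0, 3): 4, (0, 2): 5}
edges, rounds = priority_mst(4, w)
print(sorted(edges), rounds)  # [(0, 1), (1, 2), (2, 3)] 2
```

Comparing the output with a sequential MST algorithm such as Kruskal's on random connected graphs with distinct weights is a convenient way to check such a simulation.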

Awerbuch and Shiloach [2] established the correctness of their algorithm. We briefly justify the running time. Consider each iteration of the three steps. Steps 1 and 2 ensure that every star is hooked to some tree to yield a new tree with height greater than one. Since Step 3 reduces the height of every tree with height greater than one by a factor of at least 3/2, the sum of the heights of all the trees present at the start of the iteration is reduced by a factor of at least 3/2. Thus $O(\log n)$ iterations yield a single star. Since each iteration takes $O(1)$ time, the algorithm runs in $O(\log n)$ time.

4. r-COLOR MINIMIZATION PROBLEM

We obtain a Common CRCW PRAM MST algorithm by modifying the implementation of Step 1 of the algorithm of Awerbuch and Shiloach. Only Step 1 uses a priority write operation. In our algorithm, we avoid the priority write by determining the processor of highest priority wanting to write to each memory cell and having only those processors write. Since each cell then has at most one writer, the write is legal in the Common model, and the values written in the memory cells are the same as those that would have been written in the Priority CRCW PRAM model.

To determine the processor of highest priority writing to each cell, we solve a special case of the r-color minimization problem described in Fich et al. [7].

r-Color Minimization Problem

Before: Each processor $p_i$, $i = 1, \dots, p$, has a color $x_i$, $0 \le x_i \le r$, known only to itself. The color $x_i$ represents the cell $p_i$ wants to write, if any, and is 0 otherwise.

After: Each processor $p_i$ knows the value $a_i$, where $a_i = 1$ if and only if $p_i$ is the processor of lowest index writing to the cell represented by $x_i$.
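The specification can be pinned down with a sequential reference function. The Python sketch below is not a PRAM algorithm and says nothing about step counts; it assumes that list position 0 plays the role of $p_1$ and that colors are integers, and the name `color_minimization_spec` is hypothetical.

```python
from typing import List


def color_minimization_spec(x: List[int]) -> List[int]:
    """Return a, where a[i] == 1 iff processor i is the lowest-indexed writer to cell x[i].

    Color 0 means "writes nothing", so such processors always get a[i] == 0.
    """
    seen = set()
    a = []
    for color in x:
        if color != 0 and color not in seen:
            a.append(1)
            seen.add(color)
        else:
            a.append(0)
    return a


print(color_minimization_spec([2, 0, 1, 2, 1, 3]))  # [1, 0, 1, 0, 0, 1]
```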
For our algorithm, we consider the case where r = 1. Fich et al. showed that on a Common CRCW PRAM with k memory cells the 1-color minimization problem can be solved in $O(\log p / \log(k+1))$ steps. In our discussion, we present a simplified variation of their method and show how the problem can be solved in $O(\log p / \log k)$ steps.


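Let $M_1, \dots, M_k$ be the k memory cells. Assume without loss of generality that $k \le p^{\epsilon}$, where $\epsilon$ is a constant such that $0 < \epsilon \le 1/2$. If $k > p^{1/2}$, then only the first $p^{1/2}$ cells are needed to achieve $O(1)$ steps. The bound $\epsilon \le 1/2$ guarantees that $k^2 \le p$, so the p processors suffice for the subroutine Leftmost One.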

The algorithm iterates the following steps. Processor $p_i$, $i = 1, \dots, k$, writes 0 into $M_i$. The processors are then divided into k groups of nearly equal size, where each group is a set of consecutively numbered processors. The first $p \bmod k$ groups contain $\lceil p/k \rceil$ processors, and the remaining $k - (p \bmod k)$ groups contain $\lfloor p/k \rfloor$ processors. A processor $p_i$ in the jth group, $1 \le j \le k$, writes 1 into $M_j$ if and only if $x_i = 1$.

The winner is the processor of smallest index with $x_i = 1$. Thus the winner is in the group corresponding to the $M_j$ of smallest index containing 1. The algorithm determines the winning group by using the subroutine Leftmost One.

Leftmost One

Before: Cells $M_i$, $i = 1, \dots, k$, each contain 0 or 1.

After: $M_i$ contains 1 if and only if all $M_j$ for $j < i$ were initially 0 and $M_i$ was initially 1.

The Leftmost One algorithm compares all pairs of cells $M_i$ and $M_j$, $1 \le i, j \le k$. If $j < i$ and $M_i$ and $M_j$ both contain 1, then the algorithm writes 0 into $M_i$. All processors that write into a given cell write the same value 0, so these concurrent writes are legal in the Common model. The algorithm requires $k^2 \le p$ processors. After applying the Leftmost One subroutine, processors in group j read $M_j$. A group determines that it is the winning group if its processors read a 1.
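A sequential simulation of Leftmost One follows, assuming 0-based cell indices and a hypothetical function name. Each pass of the inner loop stands for the processor assigned to one pair (i, j) with j < i; all comparisons read the initial contents, and every write stores 0.

```python
from typing import List


def leftmost_one(M: List[int]) -> List[int]:
    """Keep only the leftmost 1 in M, as the pair processors would."""
    result = list(M)                      # writes go here; reads use the initial M
    for i in range(len(M)):
        for j in range(i):                # processor for the pair (i, j), j < i
            if M[i] == 1 and M[j] == 1:
                result[i] = 0             # common write: every writer stores 0
    return result


print(leftmost_one([0, 1, 0, 1, 1]))  # [0, 1, 0, 0, 0]
```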

All processors that are not in the winning group set $a_i := 0$ and stop. The processors in the winning group then repeat the 1-color minimization algorithm. This process repeats until the winning group contains only one processor, the winner.

Each iteration of the 1-color minimization algorithm reduces the number of processors that may be the winner by a factor of k. Thus the winner is determined in at most $\lceil \log p / \log k \rceil$ iterations. Since each iteration takes $O(1)$ steps, the winner is determined in $O(\log p / \log k)$ steps.
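The complete iteration can be simulated sequentially as follows. This is a sketch under stated assumptions rather than the authors' code: processors are 0-based positions in a list `x` of 0/1 colors, `k` (at least 2) is the number of cells, the contenders are split into at most k groups of consecutive processors whose sizes differ by at most one (the larger groups first), and Leftmost One is evaluated directly instead of with $k^2$ processors. The hypothetical function `one_color_minimization` also returns the number of iterations so that the iteration bound can be checked.

```python
from typing import List, Tuple


def one_color_minimization(x: List[int], k: int) -> Tuple[List[int], int]:
    """Return (a, iterations), where a[i] == 1 only for the smallest i with x[i] == 1."""
    if k < 2:
        raise ValueError("need at least two cells")
    a = [0] * len(x)
    lo, hi = 0, len(x)                     # current contenders: processors lo .. hi-1
    iterations = 0
    while hi - lo > 1:
        iterations += 1
        size = hi - lo
        g = min(k, size)                   # never more groups than contenders
        q, extra = divmod(size, g)         # the first `extra` groups get q + 1 members
        bounds = [lo]
        for t in range(g):
            bounds.append(bounds[-1] + q + (1 if t < extra else 0))
        # Cells start at 0; a member with x == 1 writes 1 into its group's cell.
        M = [int(any(x[bounds[t]:bounds[t + 1]])) for t in range(g)]
        # Leftmost One: only the first cell holding 1 keeps it.
        M = [1 if M[t] == 1 and not any(M[:t]) else 0 for t in range(g)]
        if 1 not in M:                     # no processor wants to write
            return a, iterations
        win = M.index(1)
        lo, hi = bounds[win], bounds[win + 1]  # processors outside the winning group stop
    if lo < hi and x[lo] == 1:
        a[lo] = 1                          # the single remaining processor is the winner
    return a, iterations


x = [0] * 20
x[13] = x[17] = 1
a, rounds = one_color_minimization(x, 3)
print(a.index(1), rounds)  # 13 3
```

In this example the 20 processors are split 7, 7, 6, then 3, 2, 2, then 1, 1, so three iterations suffice, in line with $\lceil \log 20 / \log 3 \rceil = 3$.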


5. COMMON CRCW PRAM MST ALGORITHM

Our Common CRCW PRAM MST algorithm is the same as the Priority CRCW PRAM algorithm of Awerbuch and Shiloach except that Step 1 is modified to eliminate the priority concurrent write. Thus we describe the modified implementation of Step 1 only.

In Step 1 of the Priority algorithm, if more than one processor tries to hook a star with root i to a tree, then a priority write of the variable P(i) occurs. Since there is a P(i) for each vertex i, there are n cells into which processors may write. The P(i)'s are written by processors performing the hooking operation. Since processors performing the hooking operation correspond to edges leaving stars, as many as m processors may want to write into one P(i).

In the Common algorithm, we first determine the processor of highest priority writing to each P(i) and then have only that processor write. We begin with the direct implementation, which requires solving the r-color minimization problem with m processors and n colors.

To maintain the $O(\log n)$ running time of the MST algorithm, Step 1 must run in $O(1)$ time. In Step 1, the Common PRAM algorithm simultaneously solves n instances of the 1-color minimization problem, one for each P(i), using the algorithm of Section 4. Each instance requires $n^{\epsilon}$ cells and $n^{2\epsilon}$ processors to obtain an $O(1)$ time solution. During the first iteration, m processors are divided into $n^{\epsilon}$ groups. During each iteration, a processor can determine the group to which it belongs since it knows its rank from the sort performed during the initialization phase. Each iteration reduces the number of contending processors by a factor of $n^{\epsilon}$, and thus $O(\lceil \log m / \log n^{\epsilon} \rceil) = O(1)$ iterations suffice, because $m \le n^2$ gives $\log m / \log n^{\epsilon} \le 2/\epsilon$. Since there are n instances, Step 1 requires a total of $mn + n^{1+2\epsilon}$ processors and $n^{1+\epsilon}$ cells. We now show how to reduce the number of processors.


In Step 1 of the Priority algorithm, each processor corresponding to an edge leaving a star writes to exactly one P(i). Thus in the Common algorithm, each processor wanting to write is a possible winner for only one of the n instances of the 1-color minimization problem.


The absence of non-writing processors from the groups of processors formed during the solution of a 1-color minimization instance does not affect the outcome, since those processors would not have written even if they were present. Thus each processor that wants to write needs to participate only in the instance corresponding to the P(i) it wants to write. Hence, for the n instances, the algorithm requires a total of $m + n^{1+2\epsilon}$ processors: at most m contenders in total, plus $n^{2\epsilon}$ processors per instance for Leftmost One.
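The counts in this section can be made concrete with a small calculation. The hypothetical helper `step1_costs` below assumes that $n^{\epsilon}$ rounds to an integer k ≥ 2, counts 1-color minimization iterations by repeated ceiling division of the m contenders by k, and compares the direct implementation, in which each of the n instances involves all m processors plus $k^2$ processors for Leftmost One, with the reduced one, in which each contender joins a single instance.

```python
from typing import Tuple


def step1_costs(n: int, m: int, eps: float) -> Tuple[int, int, int, int]:
    """Return (iterations, direct processors, reduced processors, cells) for Step 1."""
    k = round(n ** eps)                    # groups and cells per instance: n^eps, assumed >= 2
    iterations, contenders = 0, m
    while contenders > 1:
        contenders = -(-contenders // k)   # ceil(contenders / k): size of the winning group
        iterations += 1
    direct = n * (m + k * k)               # every instance sees all m processors plus k^2
    reduced = m + n * k * k                # each contender joins only the instance for its P(i)
    cells = n * k
    return iterations, direct, reduced, cells


print(step1_costs(2**16, 2**20, 0.25))  # (5, 68736253952, 17825792, 1048576)
```

With $n = 2^{16}$, $m = 2^{20}$, and $\epsilon = 1/4$, five iterations suffice, matching $\lceil 20/4 \rceil$, and the reduction replaces the $mn = 2^{36}$ term by $m = 2^{20}$.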


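The remaining steps of the algorithm require $2m+n$ processors and $O(m+n)$ space. Since processors are reused from step to step, the algorithm needs $\max(m+n^{1+2\epsilon},\, 2m+n) \le 2m+n^{1+2\epsilon}$ processors, and its space is $O(m+n) + n^{1+\epsilon} = O(m+n^{1+\epsilon})$. Thus we have a Common CRCW PRAM algorithm for the MST problem that runs in $O(\log n)$ time using $2m+n^{1+2\epsilon}$ processors and $O(m+n^{1+\epsilon})$ space. For graphs that are not connected, the algorithm can easily be modified to find a minimum weight spanning forest, and thus the connected components.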